Estimating Haplotype Frequency and Coverage of Databases

Authors

  • Thore Egeland
  • Antonio Salas

Abstract

A variety of forensic, population, and disease studies are based on haploid DNA (e.g., mitochondrial DNA or Y-chromosome data). For any set of genetic markers, databases of conventional size will normally contain only a fraction of all haplotypes. For several applications, it is useful to have reliable estimates of haplotype frequencies, of the total number of haplotypes, and of the coverage of the database (the probability that the next random haplotype is contained in the database). We propose different approaches to the problem, based on classical methods as well as on new applications of Principal Component Analysis (PCA). We also discuss previous proposals based on saturation curves. Several conclusions can be drawn from simulated and real data. First, classical estimates of the fraction of unseen haplotypes can be seriously biased. Second, there is no obvious way to decide on the required sample size using traditional approaches. Methods based on hypothesis testing or on the length of confidence intervals may appear artificial, since no single test or parameter stands out as particularly relevant. Rather, the coverage may be more relevant, since it indicates the percentage of different haplotypes that are contained in a database: if the coverage is low, there is a considerable chance that the next haplotype to be observed does not appear in the database, which indicates that the database needs to be expanded. Finally, freeware and example data sets accompanying the methods discussed in this paper are available at http://folk.uio.no/thoree/nhap/.
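To make the notions of coverage and total number of haplotypes concrete, the sketch below computes two standard nonparametric estimates from a sample of haplotype labels. It is an illustration under stated assumptions, not the authors' method: it uses Good's (Good-Turing) coverage estimate, 1 - n1/n, where n1 is the number of haplotypes seen exactly once and n is the sample size, and the Chao1 lower-bound estimate of the total number of haplotypes. The function names are hypothetical. As the abstract warns, classical estimates of this kind can be seriously biased, so the output should be read as indicative only.

```python
from collections import Counter


def coverage_good_turing(haplotypes):
    """Good-Turing coverage estimate: 1 - n1/n.

    Assumption: this classical estimator stands in for 'coverage', i.e. the
    estimated probability that the next random haplotype is already in the
    database. n1 = number of haplotypes observed exactly once, n = sample size.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("empty sample")
    n1 = sum(1 for c in counts.values() if c == 1)
    return 1.0 - n1 / n


def chao1_total(haplotypes):
    """Chao1 estimate of the total number of distinct haplotypes.

    Uses S_obs + f1^2 / (2 f2) when doubletons exist, and the bias-corrected
    form S_obs + f1 (f1 - 1) / 2 when no haplotype is seen exactly twice.
    """
    counts = Counter(haplotypes)
    if not counts:
        raise ValueError("empty sample")
    s_obs = len(counts)                                 # distinct haplotypes observed
    f1 = sum(1 for c in counts.values() if c == 1)      # singletons
    f2 = sum(1 for c in counts.values() if c == 2)      # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    # Hypothetical toy sample of 8 haplotypes (labels are illustrative only).
    sample = ["H1", "H1", "H2", "H3", "H3", "H3", "H4", "H5"]
    print(coverage_good_turing(sample))  # 0.625 (3 singletons out of 8)
    print(chao1_total(sample))           # 9.5 (5 observed + 3^2 / (2 * 1))
```

In the toy sample, three of the eight observed haplotypes are singletons, so the estimated coverage is 0.625: roughly a 37.5% chance that the next haplotype is not yet in the database, which by the abstract's reasoning would suggest the database should be expanded.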


Similar resources

Comparing the metadata-standard entries of Persian manuscript databases with those of databases outside Iran in covering manuscript entries

Purpose: The present research aims at studying the use of metadata standards in Persian manuscripts databases, and the types and frequencies of these standards in the Optical Character Recognition (OCR) procedure of these databases. Methodology: Research population consists of four Persian databases and 12 Latin databases. The research data is gathered through a checklist, using descriptive su...


Application of Satellite Data and Data Mining Algorithms in Estimating Coverage Percent (Case study: Nadoushan Rangelands, Ardakan Plain, Yazd, Iran)

Assessing and monitoring rangelands in arid regions are important and essential tasks in order to manage the desired regions. Nowadays, satellite images are used as an approximately economical and fast way to study the vegetation in a variety of scales. This research aims to estimate the coverage percent using the digital data given by ETM+ Landsat satellite. In late May and early Ju...


A Recommendation for Net Undercount Estimation in Iran Population and Dwelling Censuses

Census counts are subject to different types of nonsampling errors. One of these main errors is coverage error. Undercount and overcount are two types of coverage error. Undercount usually occurs more than the other, thus net undercount estimation is important. There are various methods for estimating the coverage error in censuses. One of these methods is dual system (DS) that usually uses dat...


Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing

MOTIVATION A variety of hypotheses have been proposed for finding the missing heritability of complex diseases in genome-wide association studies. Studies have focused on the value of haplotype to improve the power of detecting associations with disease. To facilitate haplotype-based association analysis, it is necessary to accurately estimate haplotype frequencies of pooled samples. RESULTS ...


Frequency of the G691S/S904S haplotype of the RET proto-oncogene in patients with medullary thyroid carcinoma in the Iranian population

Background: Medullary thyroid carcinoma (MTC) occurs in both sporadic (75%) and hereditary (25%) forms. The missense mutations of the rearranged during transfection (RET) proto-oncogene in MTC development have been well demonstrated. Several studies have been published that indicate the molecular analysis of RET gene may offer early identification of those patients at high risk to develop MTC a...


Journal title:
  • PLoS ONE

Volume 3

Publication date 2008